3.8 Spatial joins

Section Contact: Emma Jones ()

Spatial joins are incredibly useful operations that allow users to link different spatial features together. If you have ever wondered if something falls inside something else, that’s a spatial join (or intersection, if we want to be completely honest with ourselves). If you have ever wanted to know information about feature y based on feature x, that’s a spatial join.

3.8.1 Case Study: Water Permit Review

A use case for using spatial joins in R is demonstrated through the process of reviewing permits in the Water Programs. During the permit review process, Water Planning staff should identify whether or not permits fall into TMDL watersheds. The following workflow documents how to identify whether or not a permitted outfall falls into a TMDL watershed.

3.8.1.1 Point Locations

First, we must create a dataset of all the point locations we want to compare against polygons (TMDL watersheds). An easy way to visually create a dataset in R (for demonstration purposes) is by using the tibble::tribble() function. We will create a tibble of 17 sites we want to investigate further. This might not be the most efficient way to bring data into your R environment for real data examples, so we advise reading in datasets from local files when possible.

library(tidyverse)

outfalls <- tribble(
  ~SiteID, ~Latitude, ~Longitude,
  "Site 1",  37.48516,  -79.166643,
  "Site 2",  37.489919, -79.228329,
  "Site 3", 37.048334,  -79.886112,
  "Site 4", 37.377499,  -79.558333,
  "Site 5", 37.211113,  -79.299446,
  "Site 6", 37.146388,  -79.619167,
  "Site 7", 37.363862,  -79.955802,
  "Site 8", 37.365318,  -79.956827,
  "Site 9", 37.24955,   -79.943387,
  "Site 10", 37.066667, -78.958333,
  "Site 11", 36.627502, -80.288611,
  "Site 12", 36.57995,  -79.428251,
  "Site 13", 36.579799, -79.427712,
  "Site 14", 36.783053, -80.003705,
  "Site 15", 36.951112, -79.351669,
  "Site 16", 36.99861,  -80.723612,
  "Site 17", 37.335081, -80.757731)

Now convert the outfalls object into a spatial dataset using the sf package.

library(sf)

outfalls <- st_as_sf(outfalls, 
                     coords = c('Longitude', 'Latitude'), 
                     remove = F, # don't remove these lat/lon cols from dataset
                     crs = 4326) # EPSG 4326 = WGS84 coordinate reference system

Let’s look at these points before proceeding to see if there is anything special about these points?

library(leaflet)
library(inlmisc)

CreateWebMap(maps = c("Topo","Imagery","Hydrography"), collapsed = TRUE) %>%
  addCircleMarkers(data = outfalls, 
              color = 'black',
              fillColor = 'blue', 
              fillOpacity = 0.6,
              stroke=5,
              label = ~SiteID )

These stations all appear to be in the Blue Ridge Regional Office region, so we will use this to dial further data queries.

3.8.2 Polygons of Interest

We know we need to compare these points to TMDL watersheds, so let’s use the latest published version of this dataset that is available on the internal GIS REST service. See the Consuming GIS REST Services in R section to learn more about querying the GIS REST service.

The below chunk creates a spatial object of all the TMDL watersheds in the state and creates a map of all the returned watersheds overlayed by the outfall points from above.

library(httr)

url <- parse_url("https://gis.deq.virginia.gov/arcgis/rest/services")
url$path <- paste(url$path, "staff/DEQInternalDataViewer/MapServer/72/query", sep = "/")
url$query <- list(where = "REGION = 'WCRO'", #"OBJECTID_1 > 0",
                  outFields = "*",
                  returnGeometry = "true",
                  f = "geojson")
request <- build_url(url)
# create a spatial dataset based on the REST query
TMDLwatersheds <- st_read(request)

# map returned watersheds and outfall points
CreateWebMap(maps = c("Topo","Imagery","Hydrography"), collapsed = TRUE) %>%
  addPolygons(data = TMDLwatersheds, 
              color = 'gray',
              fillColor = 'gray', 
              fillOpacity = 0.6,
              stroke=5,
              label = ~PJ_NAME ) %>% 
  addCircleMarkers(data = outfalls, 
              color = 'black',
              fillColor = 'blue', 
              fillOpacity = 0.6,
              stroke=5,
              label = ~SiteID )

3.8.2.1 Spatial Join

So far we have identified that the outfall point do fall within our TMDL watersheds, but how do we know which outfalls are contained within specific TMDL watersheds? Enter the spatial join. We will use the outfalls as our left hand join argument to join to our TMDL watershed layer in order to extract the TMDL watershed information relative to each outfall. Said in a more general way, we will use the points as our left hand join argument to join to our polygon layer in order to extract the polygon information relative to each point.

outfallsInTMDL <- st_join(outfalls, TMDLwatersheds)

Wait, what?

It looks like the joining step ran into an error. Time to figure out what that error message means in order to solve the problem.

So after a bit of poking around the internet, it seems this error is generated when there are geometry issues with one of the two spatial datasets involved in the join. Polygon features are usually to blame for geometry issues. Let’s start there with our troubleshooting.

Let’s try and repair the geometry issues in the local version of the TMDLwatersheds data using sf’s built in geometry repair function st_make_valid().

TMDLwatersheds_fixedGeometry <- st_make_valid(TMDLwatersheds)

Now let’s try the join again and see if we can get join to work properly.

outfallsInTMDL <- st_join(outfalls, TMDLwatersheds_fixedGeometry)

Huzzah! It worked. Let’s see what the returned data look like in tabular form.

library(DT)

datatable(outfallsInTMDL %>%  st_drop_geometry()) # drop spatial component for the HTML table

Looks good! We have each point that fell into a polygon attached to the metadata (or attribute information) associated with each polygon. Note: The sites with multiple rows indicate that the point falls into multiple overlapping polygons.

Let’s do a quick summary to understand this data:

outfallsInTMDL %>% 
  st_drop_geometry() %>%  #drop spatial part for this analysis
  group_by(SiteID) %>% 
  summarise(`n Projects` = n(),
            `Project Names` = paste(unique(PJ_NAME), collapse = ', '),
            `Project Types` = paste(unique(IMP_NAME), collapse = ', '))
## # A tibble: 17 x 4
##    SiteID  `n Projects` `Project Names`              `Project Types`            
##    <chr>          <int> <chr>                        <chr>                      
##  1 Site 1             1 James River - Lynchburg      Escherichia coli           
##  2 Site 10            1 Falling River Watershed      Escherichia coli           
##  3 Site 11            2 South Mayo River Watershed,~ Escherichia coli           
##  4 Site 12            1 Dan River Watershed          Escherichia coli           
##  5 Site 13            1 Dan River Watershed          Escherichia coli           
##  6 Site 14            2 Dan River Watershed, Smith ~ Escherichia coli, Benthic-~
##  7 Site 15            1 NA                           NA                         
##  8 Site 16            1 NA                           NA                         
##  9 Site 17            1 New River PCB                PCB in Fish Tissue         
## 10 Site 2             1 James River - Lynchburg      Escherichia coli           
## 11 Site 3             3 Roanoke (Staunton) River Wa~ Polychlorinated biphenyls,~
## 12 Site 4             4 Little Otter River, Johns C~ Benthic-Macroinvertebrate ~
## 13 Site 5             2 Big Otter River Watershed, ~ Total Fecal Coliform, Poly~
## 14 Site 6             2 Roanoke (Staunton) River Wa~ Polychlorinated biphenyls,~
## 15 Site 7             3 Tinker Creek Watershed, Upp~ Escherichia coli, Benthic-~
## 16 Site 8             3 Tinker Creek Watershed, Upp~ Escherichia coli, Benthic-~
## 17 Site 9             3 Roanoke (Staunton) River Wa~ Polychlorinated biphenyls,~

Depending on how the user needs to share the results of the analysis, the data can be saved in tabular or spatial data formats. The chunk below shows how to save the results in two different formats.

# save as a csv
write.csv(outfallsInTMDL %>% st_drop_geometry(), 'outfallsInTMDL.csv', na = '', row.names = F) # don't include geometry in tabular output

# save as a shapefile
st_write(outfallsInTMDL, 'outfallsInTMDL.shp')